Reproducibility in systems biology modelling
Authors
Krishna Tiwari [1,2], Sarubini Kananathan [1], Matthew G Roberts [1], Johannes P Meyer [1], Mohammad Umer Sharif Shohan [1], Ashley Xavier [1], Matthieu Maire [1], Ahmad Zyoud [1], Jinghao Men [1], Szeyi Ng [1], Tung V N Nguyen [1], Mihai Glont [1], Henning Hermjakob [*,1,3] and Rahuman S Malik-Sheriff [*,1] (orcid.org/0000-0003-0705-9809)
[1] European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
[2] Babraham Institute, Babraham Research Campus, Cambridge, UK
[3] Beijing Institute of Lifeomics, National Center for Protein Sciences (The Phoenix Center), Beijing, China
[*] Corresponding author.
Commentary, 23 February 2021, Open Access. Molecular Systems Biology (2021) 17: e9982. https://doi.org/10.15252/msb.20209982
Abstract
The reproducibility of scientific results is a key element of science and credibility. The lack of reproducibility across many scientific fields has emerged as an important concern. In this piece, we assess mathematical model reproducibility and propose a scorecard for improving reproducibility in this field. A survey of 1,576 scientists published in Nature (Baker, 2016) reported that over 70% of the participants failed to reproduce others' experiments and over 50% failed to reproduce their own results. It was assumed that systems biology modelling would remain relatively untouched by this crisis, as mathematical models are a specific set of computational codes representing well-defined equations that perform reproducible simulations. However, models from a number of articles were shown not to reproduce the simulation results described (Mendes, 2018). To identify the major causes of failure, we systematically analysed 455 models in conjunction with the curation process of the BioModels repository. Remarkably, about half of the models could not be reproduced, either due to incorrect or missing information in the manuscript. We propose an 8-point scorecard for modellers, reviewers and journal editors to address this crisis.
Experimental fail tests several reasons including improper documentation methodology, considering noise positive finding, unrecognized incomplete experimental variables, data fabrication bias publishing premature 2016; Munafò et al, 2017). Computational research also faces issues, compounded factors changes reference and/or formats, software versions essential methodology (Schnell, 2018; Papin 2020). Several suggestions have been improve bioinformatics (Kim involves representation biological processes investigate complex system behaviours, which cannot be studied looking at individual components (Le Novère, 2015). While analyses could factors, typically result inadvertent error order it therefore critical pinpoint precise models’ failure. Moreover, how prevalent modelling. (https://www.ebi.ac.uk/biomodels/) one largest public open-source databases quantitative models, where manually curated semantically enriched. Here, present systematic analysis reproducibility, coordinated BioModels, attempting independently total, investigated ordinary differential equation (ODE) various processes. These randomly sampled but selected based on BioModels’ priorities driven funding, collaborations, curators’ interest direct submissions BioModels. randomized, our sample covered wide range life taken 152 journals. represent 20% all literature-based available database. Model assessment manual two-step process: encoding standard formats reproducing figures manuscript followed semantic enrichment its (Malik-Sheriff Semantic done following MIRIAM guidelines involved annotation entities (species, reactions, parameters, events, etc.) cross-references controlled vocabularies such GO (Gene Ontology), ChEBI, Mathematical Modelling Ontology, Brenda Tissue Ontology Factor well resources UniProt, Ensemble, NCBI Taxonomy Reactome. steps employed reproducibility: carefully read, encoded SBML format. 
When previously submitted format, equations, values parameters initial concentration, perturbation etc. cross-verified simulations files performed predominantly using COPASI (http://copasi.org/). used original manuscript, other SimBiology toolbox (MATLAB) (https://www.mathworks.com/), libSBMLsim (https://fun.bio.keio.ac.jp/software/libsbmlsim/) Mathematica (https://www.wolfram.com/mathematica/) considered when reproduced least associated different figure, time-course plot without phase-plane plot, should match any minor deviation still acceptable if did affect conclusion study. directly description labelled “Directly Reproducible”. parameter provided article, resorted empirical trial approach correct curator expertise. For example, terms added model; potential typos misplacement decimal points corrected. after corrections “Reproduced corrections”. authors failed, yet potentially salvageable contacted possible responses recorded. them support”. Models “Non-reproducible” likely plausible non-reproducibility include (i) inconsistency structure, i.e. equation, (ii) values, (iii) concentration (iv) unknown reason. representations ones models. 49% kinetic among 389 format remaining 66 those hence they cross-checked ensure whether conditions accurately represented. There no limitations encode simulate simple ODE 233 out (51%) (Fig 1A). About This high proportion unexpected exposed serious issue within Even only 37 directly. full list link respective code status, https://www.ebi.ac.uk/biomodels/reproducibility. Figure 1. (A) reproduced. (B) 12% support. (C) year publication (D) Distribution size (number entities/species) (top) percentage (directly further efforts) non-reproducible (bottom) specified bins size. (E) reproducible, efforts (empirical correction support) journals than two work. Download figure PowerPoint effort involving careful support 1B). Forty (9%) successfully inaccurate reporting Some common errors identified corrected sign e.g. 
negative production term vice versa; equations—e.g. depletion definition; mistakes points, value 0.01 place 0.001; some concentrations inferred time point plots; (v) units nmol/l misrepresented µmol/l. always estimate expression. these cases, corresponding request seek clarification. feasible, authors’ change institutions, field, leaving academia death. attempted contact 90 less third (27) responded. who responded (13 3% total models) subsequently mostly last 5 years 1C). Surprisingly respond provide Our included sizes (3–200 entities) 1D), 1E). 63% ultimately reproduced, combining efforts. why 37% (n = 169) rescued even reason 99) 52), 44) structure 36), combination aforementioned causes. Among 99 three reasons; 19 conditions; six inconsistent structure; four structure. Yet, large 70) failure unclear. manuscripts might factors. report form plots, straightforward extract values. Insufficient content reproduce. commonly overlooked peer-review reflected Take-home messages recommendations Overall, indicated examined Although discussed community 2020), expected adverse. Given inability widespread exception few 1E), imperative revisit studies suggested Reproducibility, replicability repeatability terminologies often confused defined differently (Mi?kowski context modelling, refined definition (also referred repeatability) ability use same results, whereas build de novo expressions correctly represented originally used. focus work latter chosen criterion. challenges faced unambiguous inference variable entity name description, expression code, becomes challenging simulation. “alpha” may refer completely entities. managed overcome challenge reading strongly recommend making file self-contained proper written programming languages MATLAB, python, C R helpful model. Nevertheless, easily comprehensible, especially commented. split, notable fraction modellers COMBINE SBML, SED-ML Archive consistent framework annotate both human machine-readable. 
strong makes highly interoperable 280 supporting tools construction, simulation, visualization processing layer. disseminate greatly enhance comprehend most approaches kinetic, constraint-based, logic agent-based specifically focused type observed deterministic Other types delay, partial stochastic equations; affected extent somewhat compared appeared first reasons, suggesting underestimate. (i.e. model) Mann–Whitney U-test MATLAB function rank sum showed distribution significantly P-value < 0.0001. Obviously, smaller slightly larger ones. significant small 1D). case constraint-based flux resulting balance unique solutions tool MEMOTE developed primarily quality control currently collaborating develop procedures test Similarly, engaged (CALM). time-intensive task. On average, took week thoroughly single 2 days cases weeks. criterion consider reasonable compromise between reliably assessing huge additional required relevant One classified error. curators team, will consult each particular non-reproducibility. curator, works curating literature, model, then scientist occasionally tries literature opinion unlikely fare better curator. Further, authors, rate similar, 29 (44%) Thus, believe misclassification error, while zero, alter conclusions analysis. intended call-out rather current status behind reproducibility. Thereby, intend raise awareness researchers make contribution towards addressing “curated” section source reliable, verified keeping accessible. versioning allow version accessible part record. Currently, keep “non-curated” contains awaiting curation, do detailed curation. explicitly label because, hand, there chance and, want discourage through aware explicit labelling cause others try again. During open discussion dedicated breakout session 2020 (http://co.mbine.org/), recommended transparent flexible Leveraging lessons learned interaction communities, (Box 1). consists items help another modeller effort. scorecard. 
eight questions unit score “yes” answer. All applicable hence, scale 8, advocate get 4 least. Box Are manuscript/supplementary material? levels listed (as table) software/programming environment, algorithm, parameters/concentration/states normalization under attached supplementary code(s) shared publicly? archive, syntactically validated? deposited database? documented unambiguously entities/variables? (with expressions, conditions, relevant.) enriched, annotated Gene ChEBI database Ontologies? numerical publicly along codes? Total Score (out 8) 22% (99 reasons: miss
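The scoring rule of the scorecard in Box 1 (eight yes/no questions, one point per "yes" answer, a total out of 8, and a suggested score of at least 4) can be illustrated with a minimal Python sketch. The function and constant names are hypothetical and are not part of the original article. The eight answers are passed as an anonymous list of booleans in question order, because the sketch does not try to restate the question wording.

```python
from typing import Sequence

N_QUESTIONS = 8        # the scorecard consists of eight yes/no questions
SUGGESTED_MINIMUM = 4  # the text advocates a score of at least 4 out of 8


def scorecard_total(answers: Sequence[bool]) -> int:
    """Return the total score: one point for each "yes" (True) answer."""
    if len(answers) != N_QUESTIONS:
        raise ValueError(f"expected {N_QUESTIONS} answers, got {len(answers)}")
    return sum(1 for answer in answers if answer)


def meets_suggested_minimum(answers: Sequence[bool]) -> bool:
    """Check whether the total reaches the suggested minimum score."""
    return scorecard_total(answers) >= SUGGESTED_MINIMUM


if __name__ == "__main__":
    # Hypothetical model: "yes" to questions 1, 2, 4 and 7 only.
    example = [True, True, False, True, False, False, True, False]
    print(scorecard_total(example), meets_suggested_minimum(example))
```

Treating every question as equally weighted follows the "unit score" wording of the text. Any weighting or partial credit would be an extension beyond what the scorecard describes.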
Similar resources
Graphical Modelling in Genetics and Systems Biology
Graphical modelling in its modern form was pioneered by Lauritzen and Wermuth [43] and Pearl [55] in the 1980s, and has since found applications in fields as diverse as bioinformatics [28], customer satisfaction surveys [37] and weather forecasts [1]. Genetics and systems biology are unique among these fields in the dimension of the data sets they study, which often contain several thousand var...
Reproducibility and cell biology
Growing concerns about the reproducibility of published research threaten to undermine the scientific enterprise and erode public trust. Conscientious application of “best practices” for the generation and reporting of research, along with post-publication access to raw data and other research materials, will protect the integrity of the research literature. Research reproducibility is an incre...
ChEBI for systems biology and metabolic modelling
ChEBI (http://www.ebi.ac.uk/chebi) is a curated database and ontology of biologically relevant small molecules. It is widely used as a reference for chemicals in the context of biological data such as protein interactions, pathways, and models (Hastings et al., 2013). As of the last release (May 2015), ChEBI contains 44,263 fully curated entries, each of which is classified within one of the su...
Journal
Journal title: Molecular Systems Biology
Year: 2021
ISSN: 1744-4292
DOI: https://doi.org/10.15252/msb.20209982